5 research outputs found

    Index compression for information retrieval systems

    [Abstract] Given the increasing amount of information that is available today, there is a clear need for Information Retrieval (IR) systems that can process this information in an efficient and effective way. Efficient processing means minimising the amount of time and space required to process data, whereas effective processing means identifying accurately which information is relevant to the user and which is not. Traditionally, efficiency and effectiveness are at opposite ends (what is beneficial to efficiency is usually harmful to effectiveness, and vice versa), so the challenge of IR systems is to find a compromise between efficient and effective data processing. This thesis investigates the efficiency of IR systems. It suggests several novel strategies that can render IR systems more efficient by reducing their index size, an approach referred to as index compression. The index is the data structure that stores the information handled in the retrieval process. Two different approaches are proposed for index compression, namely document reordering and static index pruning. Both of these approaches exploit document collection characteristics in order to reduce the size of indexes, either by reassigning the identifiers of the documents in the collection, or by pruning the index to selectively discard information that is less relevant to the retrieval process. The index compression strategies proposed in this thesis can be grouped into two categories: (i) Strategies which extend the state of the art in the field of efficiency methods in novel ways. (ii) Strategies which are derived from properties pertaining to the effectiveness of IR systems; these are novel strategies, because they are derived from effectiveness as opposed to efficiency principles, and also because they show that efficiency and effectiveness can be successfully combined for retrieval. 
The main contributions of this work are in indicating principled extensions of the state of the art in index compression, and also in suggesting novel theoretically-driven index compression techniques which are derived from principles of IR effectiveness. All these techniques are evaluated extensively, in thorough experiments involving established datasets and baselines, which allow for a straightforward comparison with the state of the art. Moreover, the optimality of the proposed approaches is addressed from a theoretical perspective.
[Summary] Given the growing amount of information available today, there is a clear need for Information Retrieval (IR) systems capable of processing that information effectively and efficiently. In this context, efficient refers to the amount of time and space required to process data, whereas effective means identifying precisely which information is relevant to the user and which is not. Traditionally, efficiency and effectiveness sit at opposite poles (what benefits efficiency usually harms effectiveness, and vice versa), so a challenge for IR systems is to find a suitable compromise between effective and efficient data processing. This thesis investigates the problem of the efficiency of IR systems. It suggests several novel strategies that can reduce the size of IR system indexes, framed within the techniques known as index compression. The index is the data structure that stores the information used in the retrieval process. Two different approaches to index compression are presented, referred to as document reordering and static index pruning. Both approaches exploit characteristics of document collections to reduce the final size of the indexes, either by reassigning the identifiers of the documents in the collection or by selectively discarding the information that is "less relevant" to the retrieval process. The compression strategies proposed in this thesis can be grouped into two categories: (i) strategies that extend the state of the art in efficiency in a novel way and (ii) strategies derived from properties related to the principles of effectiveness in IR systems; these strategies are novel because they are derived from effectiveness principles as opposed to efficiency principles, and because they reveal how efficiency and effectiveness can be combined effectively for information retrieval. The contributions of this thesis include the elaboration of state-of-the-art index compression techniques and the derivation of compression techniques based on theoretical foundations drawn from the principles of IR effectiveness. All these techniques have been evaluated extensively in numerous experiments involving well-established datasets and baselines, which allow a direct comparison with the state of the art. Finally, the optimality of the presented approaches is addressed from a theoretical perspective.
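
As a rough illustration of the two approaches named in the abstract, the sketch below measures the size of a toy inverted index before and after document reordering and static pruning. The abstract does not specify an encoding or a pruning criterion, so this sketch assumes gap encoding with variable-byte codes and a top-k-per-term cut by score; the function names, the toy index, the identifier mapping and the scores are all hypothetical.

```python
def vbyte_len(n: int) -> int:
    """Bytes needed to variable-byte encode a positive integer (7 payload bits per byte)."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length


def postings_size(doc_ids) -> int:
    """Bytes needed to store a posting list as variable-byte encoded gaps."""
    total, prev = 0, 0
    for d in sorted(doc_ids):
        total += vbyte_len(d - prev)  # store the gap, not the absolute identifier
        prev = d
    return total


def index_size(index) -> int:
    """Total bytes over all posting lists of the index."""
    return sum(postings_size(p) for p in index.values())


def reorder(index, mapping):
    """Document reordering: rewrite every posting with its new identifier."""
    return {term: [mapping[d] for d in p] for term, p in index.items()}


def prune(index, scores, keep):
    """Static pruning: keep only the `keep` highest-scoring postings of each term."""
    return {
        term: sorted(sorted(p, key=lambda d: scores[(term, d)], reverse=True)[:keep])
        for term, p in index.items()
    }


# Toy index (assumed data): the documents of each term are scattered over the ID space.
index = {"a": [1, 200, 400, 600], "b": [100, 300, 500, 700]}

# Hypothetical reassignment that gives documents sharing a term adjacent identifiers.
mapping = {1: 1, 200: 2, 400: 3, 600: 4, 100: 5, 300: 6, 500: 7, 700: 8}

# Hypothetical per-posting relevance scores used as the pruning criterion.
scores = {
    ("a", 1): 0.9, ("a", 200): 0.1, ("a", 400): 0.5, ("a", 600): 0.2,
    ("b", 100): 0.3, ("b", 300): 0.8, ("b", 500): 0.7, ("b", 700): 0.1,
}

original_bytes = index_size(index)
reordered_bytes = index_size(reorder(index, mapping))
pruned_bytes = index_size(prune(index, scores, keep=2))
print(original_bytes, reordered_bytes, pruned_bytes)
```

Reordering keeps every posting and only changes the identifiers, so it is lossless, whereas pruning discards postings and may therefore affect retrieval results.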

    The First Post-Kepler Brightness Dips of KIC 8462852

    We present a photometric detection of the first brightness dips of the unique variable star KIC 8462852 since the end of the Kepler space mission in 2013 May. Our regular photometric surveillance started in 2015 October, and a sequence of dipping began in 2017 May, continuing through the end of 2017, when the star was no longer visible from Earth. We distinguish four main 1-2.5% dips, named "Elsie," "Celeste," "Skara Brae," and "Angkor," which persist on timescales from several days to weeks. Our main results so far are: (i) there are no apparent changes of the stellar spectrum or polarization during the dips; (ii) the multiband photometry of the dips shows differential reddening favoring non-grey extinction. Therefore, our data are inconsistent with dip models that invoke optically thick material; rather, they are in line with predictions for an occulter consisting primarily of ordinary dust, where much of the material must be optically thin with a size scale << 1 μm, and may also be consistent with models invoking variations intrinsic to the stellar photosphere. Notably, our data do not place constraints on the color of the longer-term "secular" dimming, which may be caused by independent processes, or probe different regimes of a single process.

    Detection of new drivers of frequent B-cell lymphoid neoplasms using an integrated analysis of whole genomes.

    B-cell lymphoproliferative disorders exhibit a diverse spectrum of diagnostic entities with heterogeneous behaviour. Multiple efforts have focused on the determination of the genomic drivers of B-cell lymphoma subtypes. In the meantime, the aggregation of diverse tumors in pan-cancer genomic studies has become a useful tool to detect new driver genes, while enabling the comparison of mutational patterns across tumors. Here we present an integrated analysis of 354 B-cell lymphoid disorders. 112 recurrently mutated genes were discovered, of which KMT2D, CREBBP, IGLL5 and BCL2 were the most frequent, and 31 genes were putative new drivers. Mutations in CREBBP, TNFRSF14 and KMT2D predominated in follicular lymphoma, whereas those in BTG2, HTA-A and PIM1 were more frequent in diffuse large B-cell lymphoma. Additionally, we discovered 31 significantly mutated protein networks, reinforcing the role of genes such as CREBBP, EEF1A1, STAT6, GNA13 and TP53, but also pointing towards a myriad of infrequent players in lymphomagenesis. Finally, we report aberrant expression of oncogenes and tumor suppressors associated with novel noncoding mutations (DTX1 and S1PR2), and new recurrent copy number aberrations affecting immune checkpoint regulators (CD83, PVR) and B-cell specific genes (TNFRSF13C). Our analysis expands the number of mutational drivers of B-cell lymphoid neoplasms, and identifies several differential somatic events between disease subtypes.

    Characteristics and predictors of death among 4035 consecutively hospitalized patients with COVID-19 in Spain
